1. Introduction

Let us point out that there is nothing unexpected in this paper. The sole element
of novelty is the formal description of a simple relation between a chapter
of mathematical logic and mathematical statistics. The word semantic occurring
in the title indicates that, roughly speaking, provability or nonprovability
is to be estimated on the basis of truth and falsehood in interpretations in models.
The logical formalism used in this paper is the monadic logic introduced by P. R.
Halmos in [2]. In principle it is possible to replace the monadic logic by a more
developed formalism, for instance, by polyadic logic [3]. The elements whose
provability or nonprovability is to be estimated, as well as the interpretations,
are chosen at random by appropriate chance mechanisms; hence the whole
problem is probabilistic in nature. The estimation procedures established in
this paper possess a natural optimum property. The study of the behavior of
these procedures at infinity shows that the statistical decision functions of finite
size which estimate provability are, in fact, asymptotically good approximate
proofs. One may hope that the questions treated in this paper reflect at least the
most elementary features of heuristic reasoning, which is so perfectly realized by
the human brain.

All that is necessary for an easy understanding is developed in the paper in
full detail and with intuitive justification. The main reason is that one cannot
expect that, in general, specialists in mathematical logic are familiar with the
concepts, methods and results of statistical decision theory or that statisticians
are familiar with the formalisms of mathematical logic.

The  basic  concepts  and  results  of  statistical  decision  theory  on  an  appropriate 
level  of  generality  are  summarized  in  section  2.  These  results  are  then  applied 
in  section  3  to  the  problem  of  statistical  estimation  of  belonging  relations.  The 
passage  from  the  considerations  of  section  3  to  the  solution  of  our  main  problem 
of  statistical  estimation  of  provability  is  completely  transparent  and  forms  the 
contents  of  section  4. 

The  present  paper,  which  is  closely  connected  with  [8],  does  not  furnish  more 
than  may  be  intuitively  expected  and,  therefore,  its  practical  value  is  very 
limited. Further developments in this direction, however, will probably throw
some light on the mechanism of human behavior in problem solving.

2. The Neyman-Pearson theorem

A  wide  variety  of  problems  of  mathematical  statistics  can  be  reduced  to  a 
simple  application  of  a  classical  theorem  due  to  J.  Neyman  and  E.  S.  Pearson 
[5],  [6].  It  is  not  surprising  that  this  famous  theorem  plays  a  decisive  role  in  our 
considerations.  Its  original  version,  however,  does  not  fulfil  our  requirements. 
The  main  reason  is  that  it  does  not  allow  the  discussion  of  cases  in  which  more 
general  sample  spaces  occur.  We  shall  see  later  that  an  adequate  generalization 
of  the  Neyman-Pearson  theorem  can  be  easily  obtained. 

Our basic probability space will be denoted, as usual, by $(\Omega, \mathfrak{S}, \mu)$,
where $\Omega$ is the set of elementary events, $\mathfrak{S}$ the sigma-algebra of
random events and $\mu$ the probability measure on $\mathfrak{S}$. The symbol $\omega$
will always mean an element of $\Omega$. Throughout this paper the notation just
introduced will be preserved.

A statistical decision problem is defined to be a pair $(\varphi, \xi)$ of random
variables, where $\varphi$ takes its values in the parameter space and $\xi$ ranges over
the sample space. The parameter space is assumed to consist of exactly two
elements, namely 0 and 1; hence the measurability of $\varphi$ is assured by the
requirement that

$$\{\omega : \varphi(\omega) = 1\} \in \mathfrak{S}.$$

On the other hand, no restriction will be imposed on the range $X$ of $\xi$ except
that it is supplied with a fixed sigma-algebra $\mathfrak{X}$ of subsets of $X$. The
measurable space $(X, \mathfrak{X})$ is said to be the sample space. The transformation
$\xi$ of $\Omega$ into $X$ will be called a random sample if

$$\{\omega : \xi(\omega) \in E\} \in \mathfrak{S}$$

for every set $E$ from $\mathfrak{X}$.

Roughly speaking, a statistical decision is an action determined by the value
of the random sample. This action can be formally described using the concept
of decision function. The domain of a decision function is the sample space and
its range is usually called the space of decisions. In our case, however, the space
of decisions is assumed to coincide with the parameter space; hence a decision
function $\delta$ is a function defined on $X$ and taking the values 0 or 1. But this is
not enough. In order to ensure that the compound transformation $\delta[\xi(\cdot)]$
becomes a random variable, it is reasonable to impose on $\delta$ an additional
condition of measurability, namely,

$$\{x : \delta(x) = 1\} \in \mathfrak{X}.$$

A natural way to evaluate statistical decisions with respect to the
random occurrence of parameters is the convention that

$$(\{\omega : \varphi(\omega) = 1\} \cap \{\omega : \delta[\xi(\omega)] = 0\}) \cup (\{\omega : \varphi(\omega) = 0\} \cap \{\omega : \delta[\xi(\omega)] = 1\})$$

means the random event of incorrect decisions.

Our main question is how to choose the decision function $\delta$ in order to make
the probability of the random event of incorrect decisions as small as possible.
The answer is quite satisfactory.

Theorem 1. There always exists a statistical decision function which minimizes
the probability of the random event of incorrect decisions.

The proof is a simple application of the Hahn decomposition theorem [4].
Let us write

$$\nu(E) = \mu[\{\omega : \varphi(\omega) = 1\} \cap \xi^{-1}(E)] - \mu[\{\omega : \varphi(\omega) = 0\} \cap \xi^{-1}(E)]$$

for every $E$ from $\mathfrak{X}$. Clearly, $\nu$ is a signed measure on $\mathfrak{X}$. It is well
known that there exists a set $H$ from $\mathfrak{X}$ such that $\nu(H \cap E) \geq 0$ and
$\nu(H' \cap E) \leq 0$ for every $E$ from $\mathfrak{X}$, where $H' = X - H$. Since

$$\nu(E) = \nu(H \cap E) + \nu(H' \cap E) \leq \nu(H) + \nu(H' \cap E) \leq \nu(H)$$

for every $E$ from $\mathfrak{X}$, the number $\nu(H)$ is the maximum of $\nu$ on $\mathfrak{X}$.
Now let us define the decision function $\beta$ by the requirement that

$$\{x : \beta(x) = 1\} = H.$$

For every decision function $\delta$ the probability of the random event of
incorrect decisions is equal to

$$\mu\{\omega : \varphi(\omega) = 1\} - \nu\{x : \delta(x) = 1\};$$

hence, using the fact that $\beta$ is determined by the Hahn decomposition $(H, H')$
of $\nu$, we can write

$$\mu\{\omega : \varphi(\omega) = 1\} - \nu\{x : \beta(x) = 1\} \leq \mu\{\omega : \varphi(\omega) = 1\} - \nu\{x : \delta(x) = 1\}$$

for every decision function $\delta$, Q.E.D.
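
The argument of the proof can be checked by brute force on a finite sample space. The following is only an illustrative sketch: the three-point sample space and the joint probabilities in `joint` are hypothetical values chosen for the example, not taken from the paper. The set `H` plays the role of the positive set of the Hahn decomposition of $\nu$.

```python
from itertools import product

# Hypothetical joint distribution: joint[(phi, x)] is the probability that
# the parameter equals phi and the sample equals x (assumed values).
joint = {
    (1, "a"): 0.30, (0, "a"): 0.05,
    (1, "b"): 0.10, (0, "b"): 0.20,
    (1, "c"): 0.05, (0, "c"): 0.30,
}
X = ["a", "b", "c"]
alpha = sum(joint[(1, x)] for x in X)  # a priori probability of phi = 1


def nu(E):
    """Signed measure nu(E) = mu(phi=1, xi in E) - mu(phi=0, xi in E)."""
    return sum(joint[(1, x)] - joint[(0, x)] for x in E)


def error(decide_one):
    """Probability of an incorrect decision when delta = 1 exactly on decide_one."""
    wrong = 0.0
    for x in X:
        if x in decide_one:
            wrong += joint[(0, x)]  # decided 1, parameter was 0
        else:
            wrong += joint[(1, x)]  # decided 0, parameter was 1
    return wrong


# Positive set of a Hahn decomposition: the points carrying positive nu-mass.
H = {x for x in X if nu({x}) > 0}

# Compare with every one of the 2^|X| decision functions.
all_sets = [
    {x for x, bit in zip(X, bits) if bit}
    for bits in product([0, 1], repeat=len(X))
]
best = min(error(E) for E in all_sets)
```

In this toy case `error(H)` coincides with `best` and with `alpha - nu(H)`, which is the identity used in the proof.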

The decision function $\beta$, whose existence is assured by theorem 1, is said to be
the Bayes solution of the statistical decision problem $(\varphi, \xi)$.

It is easy to verify that the signed measure $\nu$ is absolutely continuous with
respect to the probability measure $\mu\xi^{-1}$ on $\mathfrak{X}$; hence, using the
Radon-Nikodym theorem [4], we can state that there exists a real valued measurable
function $h$ on $X$ such that

$$\nu(E) = \int_E h(x) \, d\mu\xi^{-1}$$

for every set $E$ from $\mathfrak{X}$. We see at once that the set

$$\{x : h(x) > 0\}$$

and its complement determine a Hahn decomposition of $\nu$ and this is in fact the
content of the Neyman-Pearson theorem. It is, however, more appropriate to
formulate this theorem in terms of the measurable functions $h^+$ and $h^-$ defined
for every element $x$ of $X$ and every set $E$ from $\mathfrak{X}$ by the equations

$$\mu(\{\omega : \varphi(\omega) = 1\} \cap \xi^{-1}(E)) = \alpha \int_E h^+(x) \, d\mu\xi^{-1},$$

$$\mu(\{\omega : \varphi(\omega) = 0\} \cap \xi^{-1}(E)) = (1 - \alpha) \int_E h^-(x) \, d\mu\xi^{-1},$$

where

$$\alpha = \mu\{\omega : \varphi(\omega) = 1\}.$$
The number $\alpha$ is said to be the a priori probability in the parameter space.
Clearly, if $\alpha > 0$ then $h^+$ is a conditional probability density and if
$\alpha < 1$ then $h^-$ is a conditional probability density. Since

$$\mu\xi^{-1}\{x : h(x) = \alpha h^+(x) - (1 - \alpha) h^-(x)\} = 1,$$

the Neyman-Pearson theorem can be formulated as follows:

Theorem 2. The statistical decision function $\beta$ determined by the relation

$$\{x : \beta(x) = 1\} = \{x : \alpha h^+(x) > (1 - \alpha) h^-(x)\}$$

minimizes the probability of the random event of incorrect decisions.

In applications of this theorem the densities $h^+$ and $h^-$ are always assumed to
be known; hence the Bayes solution $\beta$ of the statistical decision problem
$(\varphi, \xi)$ depends only on the a priori probability $\alpha$ in the parameter space.
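
On a finite sample space the densities of theorem 2 reduce to ratios of point probabilities. The sketch below assumes that every sample point has positive probability and that $0 < \alpha < 1$; the function name `bayes_rule` and the numbers in `joint` are hypothetical and serve only as an illustration.

```python
def bayes_rule(joint, X):
    """Return alpha, the marginal of xi, h+, h- and the Bayes rule beta.

    joint[(phi, x)] is the probability of parameter phi together with
    sample point x (assumed finite, every point of positive probability).
    """
    alpha = sum(joint[(1, x)] for x in X)
    marginal = {x: joint[(1, x)] + joint[(0, x)] for x in X}
    # Densities with respect to the distribution of xi, as in the text:
    # mu(phi=1, xi=x) = alpha * h_plus[x] * marginal[x], and similarly for h_minus.
    h_plus = {x: joint[(1, x)] / (alpha * marginal[x]) for x in X}
    h_minus = {x: joint[(0, x)] / ((1 - alpha) * marginal[x]) for x in X}
    beta = {
        x: 1 if alpha * h_plus[x] > (1 - alpha) * h_minus[x] else 0
        for x in X
    }
    return alpha, marginal, h_plus, h_minus, beta


# Hypothetical example data.
X = ["a", "b", "c"]
joint = {
    (1, "a"): 0.30, (0, "a"): 0.05,
    (1, "b"): 0.10, (0, "b"): 0.20,
    (1, "c"): 0.05, (0, "c"): 0.30,
}
alpha, marginal, h_plus, h_minus, beta = bayes_rule(joint, X)
```

Here `beta` assigns the decision 1 only to the point `"a"`, and both `h_plus` and `h_minus` integrate to one against `marginal`, in line with their description as conditional probability densities.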

Now  we  shall  introduce  the  abstract  substitute  of  the  concept  of  sample  size. 
The  classical  model  shows  that  one  of  the  most  important  consequences  of  the 
reduction  of  sample  size  is  a  restriction  imposed  on  the  measurability  of  the 
decision  functions.  This  fact  motivates  our  definition  of  the  size  of  a  decision 
function. 

Let $\mathfrak{X}_1, \mathfrak{X}_2, \mathfrak{X}_3, \ldots$ be a nondecreasing sequence of
sigma-algebras of subsets of $X$ and suppose that the union

$$\bigcup_{n=1}^{\infty} \mathfrak{X}_n$$

is a base of the sigma-algebra $\mathfrak{X}$. This sequence will serve as a scale of
the sizes of decision functions.

The decision function $\delta$ is said to be of size $n$ if it is measurable with respect
to the sigma-algebra $\mathfrak{X}_n$, that is, if

$$\{x : \delta(x) = 1\} \in \mathfrak{X}_n,$$

but it is not measurable with respect to the sigma-algebra $\mathfrak{X}_m$ for
$m = 1, 2, \ldots, n - 1$. We shall say that $\delta$ is of finite size if there exists a
positive integer $n$ such that $\delta$ is of size $n$. The decision function $\delta$, which
is by definition measurable with respect to the whole sigma-algebra $\mathfrak{X}$, is said
to be of infinite size if it is not of finite size. Clearly, if there exists a decision
function of infinite size then, roughly speaking, the scale
$\mathfrak{X}_1, \mathfrak{X}_2, \mathfrak{X}_3, \ldots$ must effectively have an infinite number
of divisions.

Denoting by $\Delta$ the set of all decision functions in $X$ and by $\Delta_n$ that of all
decision functions in $X$ at most of size $n$ for $n = 1, 2, 3, \ldots$, we see at once that

$$\Delta_1 \subset \Delta_2 \subset \Delta_3 \subset \cdots \subset \Delta;$$

hence, if $\epsilon$ is the probability of the random event of incorrect decisions associated
with the Bayes decision function $\beta$ from $\Delta$ and $\epsilon_n$ that associated with the
Bayes decision function $\beta_n$ from $\Delta_n$ for $n = 1, 2, 3, \ldots$, then

$$\epsilon_1 \geq \epsilon_2 \geq \epsilon_3 \geq \cdots \geq \epsilon,$$

that is, as may be intuitively expected, the least probabilities of making incorrect
decisions do not increase as the sizes of the decision functions admitted to the
competition increase to infinity.

By theorem 2 a Bayes decision function $\beta_n$ of size $n$ is determined by the
relation

$$\{x : \beta_n(x) = 1\} = \{x : \alpha h_n^+(x) > (1 - \alpha) h_n^-(x)\}$$

for $n = 1, 2, 3, \ldots$, where $h_n^+$ and $h_n^-$ are defined using the sigma-algebra
$\mathfrak{X}_n$ in the same way as $h^+$ and $h^-$ were defined using the whole
sigma-algebra $\mathfrak{X}$.

The main effect of increasing the sample size can be expressed as

Theorem 3. The sequence of random variables $h_1^+[\xi(\cdot)], h_2^+[\xi(\cdot)], h_3^+[\xi(\cdot)], \ldots$
converges to the random variable $h^+[\xi(\cdot)]$ with probability one, the sequence of random
variables $h_1^-[\xi(\cdot)], h_2^-[\xi(\cdot)], h_3^-[\xi(\cdot)], \ldots$ converges to the random variable
$h^-[\xi(\cdot)]$ with probability one, and the sequence $\epsilon_1, \epsilon_2, \epsilon_3, \ldots$ of
probabilities of the random event of incorrect decisions, associated successively with the
Bayes decision functions $\beta_1, \beta_2, \beta_3, \ldots$, converges to the probability $\epsilon$
of the random event of incorrect decisions associated with the Bayes decision function $\beta$.

The  first  two  assertions  of  theorem  3  are  immediate  consequences  of  a  well- 
known  martingale  theorem  [1]  and  the  last  assertion  is  contained  in  [7]  as  a 
particular  case. 

Let us note that if $\epsilon = 0$ then the last assertion of theorem 3 expresses the
well-known property of consistency of the Bayes decision functions $\beta_1, \beta_2, \beta_3, \ldots$.
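
The monotonicity $\epsilon_1 \geq \epsilon_2 \geq \cdots$ can be observed numerically when the scale of sigma-algebras is generated by the first $n$ coordinates of a binary sequence. The following sketch rests on assumptions that are not in the paper: sequences of length 3, an a priori probability of 1/2, and coordinates that are conditionally independent with the hypothetical success probabilities in `P_ONE`.

```python
from itertools import product

LENGTH = 3                    # assumed length of the observed binary sequence
PRIOR = {1: 0.5, 0: 0.5}      # assumed a priori probabilities of phi
P_ONE = {1: 0.8, 0: 0.3}      # assumed P(coordinate = 1 | phi)


def joint_probability(phi, x):
    """Probability of parameter phi together with the sequence x."""
    prob = PRIOR[phi]
    for bit in x:
        prob *= P_ONE[phi] if bit == 1 else 1.0 - P_ONE[phi]
    return prob


def bayes_error_of_size(n):
    """Least error probability among rules depending on the first n coordinates."""
    total = 0.0
    for prefix in product([0, 1], repeat=n):
        mass = {0: 0.0, 1: 0.0}
        for x in product([0, 1], repeat=LENGTH):
            if x[:n] == prefix:
                for phi in (0, 1):
                    mass[phi] += joint_probability(phi, x)
        # On each cylinder the best rule errs with the smaller of the two masses.
        total += min(mass[0], mass[1])
    return total


errors = [bayes_error_of_size(n) for n in range(1, LENGTH + 1)]
```

The list `errors` is nonincreasing, as the inclusion of the classes of decision functions requires.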

3.  Statistical  estimation  of  belonging  relations 

A  wide  variety  of  questions  concerning  statistical  estimation  of  provability 
possesses  a  common  statistical  structure  of  very  elementary  nature  and  this 
fact  enables  us  to  treat  the  basic  statistical  problem  separately  and  independently 
of  any  consideration  belonging  purely  to  the  domain  of  mathematical  logic. 
After establishing the general results it remains only to interpret them
appropriately in order to obtain the desired final answer to various questions of
statistical estimation of provability. The realization of this last step is, however,
only a routine matter.

Suppose  that  one  wants  to  decide  whether  an  element  chosen  at  random  by  an 
appropriate  chance  mechanism  from  a  fixed  set  A  belongs  or  does  not  belong  to  a 
fixed  nonempty  proper  subset  M  of  A. 

The random variable $\eta$ taking values in $A$ is assumed to be a formal substitute
for our basic chance mechanism. One of the most natural requirements concerning
measurability is

$$\{\omega : \eta(\omega) \in M\} \in \mathfrak{S}.$$

The direct observation on $M$ is replaced by observations on the subsets $Q(m)$
of $A$ for $m = 1, 2, 3, \ldots$; hence it is also natural to impose on $\eta$ an additional
condition, namely,

$$\{\omega : \eta(\omega) \in Q(m)\} \in \mathfrak{S}$$

for $m = 1, 2, 3, \ldots$, and this completes the definition of the random variable $\eta$.

Now let $\tau$ be an ordinary random variable whose values are positive integers.
The compound transformation $Q[\tau(\cdot)]$ is a random variable in the sense that

$$(1) \qquad \{\omega : p \in Q[\tau(\omega)]\} \in \mathfrak{S}$$

for every element $p$ of $A$. This follows from the obvious identity

$$\{\omega : p \in Q[\tau(\omega)]\} = \bigcup_{j \geq 1} \{\omega : \tau(\omega) = m_j\},$$

where $m_j$ is the $j$th positive integer for which $p \in Q(m_j)$.

Clearly,

$$\{\omega : \eta(\omega) \in Q[\tau(\omega)]\} = \bigcup_{m=1}^{\infty} [\{\omega : \eta(\omega) \in Q(m)\} \cap \{\omega : \tau(\omega) = m\}];$$

hence we can state that

$$(2) \qquad \{\omega : \eta(\omega) \in Q[\tau(\omega)]\} \in \mathfrak{S}$$

and this is the most important fact concerning the relation between the two
kinds of random variables.

In elementary set theory the relation $p \in M$ is often expressed in terms of the
characteristic function $c$ of $M$ by the equivalent statement that $c(p) = 1$. A
slightly more complicated concept is that of the characteristic function of a
random set. If, for each $m = 1, 2, 3, \ldots$, $c(m)$ denotes the characteristic function
of the set $Q(m)$ then by (1) the compound function $c[\tau(\cdot)]$ is an ordinary random
variable taking the values 1 or 0. The random variable $c[\tau(\cdot)]$ is said to be the
characteristic function of the random set $Q[\tau(\cdot)]$. The element $p$ of $A$ belongs to
$Q[\tau(\omega)]$ or to its complement $A - Q[\tau(\omega)]$ according as $c[\tau(\omega)](p) = 1$ or
$c[\tau(\omega)](p) = 0$.

Clearly, the compound transformation $c[\eta(\cdot)]$ is an ordinary random variable
taking the values 1 or 0. The value of $\eta$ at $\omega$ belongs to $M$ or to its complement
$A - M$ according as $c[\eta(\omega)] = 1$ or $c[\eta(\omega)] = 0$. By (2) the compound
transformation $c[\tau(\cdot)][\eta(\cdot)]$ is an ordinary random variable taking the values 1 or 0.
The value of $\eta$ at $\omega$ belongs to $Q[\tau(\omega)]$ or to its complement $A - Q[\tau(\omega)]$
according as $c[\tau(\omega)][\eta(\omega)] = 1$ or $c[\tau(\omega)][\eta(\omega)] = 0$. We have thus defined a
probabilistic extension of belonging relations.

In order to simplify the notation we shall write $\varphi(\cdot)$ instead of $c[\eta(\cdot)]$ and
$\chi(\cdot)$ instead of $c[\tau(\cdot)][\eta(\cdot)]$.

Let $X$ be the set of all sequences $x = (x_1, x_2, x_3, \ldots)$ every term of which is
either equal to 1 or 0. Coincidence in the first $n$ terms of sequences from $X$ is an
equivalence relation in $X$. The class $\mathfrak{X}_n$ of all unions of equivalence sets induced
by this equivalence relation is a complete algebra of subsets of $X$ for every
$n = 1, 2, 3, \ldots$. The sets from $\mathfrak{X}_n$ are called $n$-dimensional cylinders. Our
basic sigma-algebra $\mathfrak{X}$ of subsets of $X$ is that induced by the union

$$\bigcup_{n=1}^{\infty} \mathfrak{X}_n.$$

Let $\tau_1, \tau_2, \tau_3, \ldots$ be a sequence of integral-valued random variables. Then
the sequence

$$\xi = (\chi_1, \chi_2, \chi_3, \ldots),$$

where $\chi_n(\cdot) = c[\tau_n(\cdot)][\eta(\cdot)]$ for $n = 1, 2, 3, \ldots$, is a random vector with values
in $X$. Clearly, $\mathfrak{X}$ is the smallest sigma-algebra of subsets of $X$ for which the
vector $\xi$ is measurable.

Now the ground is prepared to put the traditional machinery of statistical
decision functions into action. The passage from the general scheme of statistical
decision to our particular case is very simple because the notation of section 2 is
preserved. As has already been pointed out in section 2, the Bayes solution of a
statistical decision problem depends on the a priori probability in the parameter
space. We shall see, however, that, as compared with the general case, our
particular version of the statistical decision problem is, roughly speaking, less
sensitive to the exact knowledge of the a priori probability, provided that a very
simple and natural condition, namely,

$$(3) \qquad M \subset Q(m)$$

for $m = 1, 2, 3, \ldots$, is satisfied. We shall see that under this condition only two
decision functions can occur: either the decision function which associates with every
sample point $x$ of $X$ the decision 0, or the decision function $\beta_n$ which associates
with the sample point $x$ of $X$ the decision 1 or 0 according as the first $n$ coordinates
of $x$ are all equal to 1 or at least one of these coordinates is equal to 0. More precisely,

Theorem 4. Under (3) the Bayes solution of size $n$ of the statistical decision
problem $(\varphi, \xi)$ is determined by the decision function $\beta_n$ or by the decision
function identically equal to 0, and the probability of the random event of incorrect
decisions is equal to

$$(4) \qquad (1 - \alpha) h_n^-(1, 1, \ldots, 1, x_{n+1}, x_{n+2}, \ldots)$$

or to $\alpha$ according as

$$(5) \qquad h_n^-(1, 1, \ldots, 1, x_{n+1}, \ldots) \leq \frac{\alpha}{1 - \alpha}$$

or the opposite inequality holds.

The details of the proof can be omitted because theorem 4 is nothing else
but a particular version of theorem 2. It suffices to note that, as compared with
theorem 2, the main simplification arises from (3) and from the definition of $X$,
$\mathfrak{X}_n$ and $\xi$. Under these conditions $h_n^+(x) = 1$ or 0, according as the first $n$
coordinates of $x$ are all equal to 1 or at least one of these coordinates is equal to 0,
and $0 \leq h_n^-(x) \leq 1$ for every $x$ from $X$; hence theorem 2 is immediately
applicable.

In order to make the intuitive content of the theorem just established more
transparent we shall give an informal description of an experimental procedure
for estimating whether an element of $A$ chosen by $\eta$ belongs to $M$ or to its
complement $A - M$ using the Bayes decision procedure of size $n$, that is, that
determined by the random variables $\tau_1, \tau_2, \ldots, \tau_n$. Whenever the inequality (5)
does not hold, the value of $\eta$ is always estimated to belong to $A - M$.

If (5) holds then the decision procedure runs as follows: At the first step we
choose the set $Q(m)$ determined by the value of $\tau_1$. If the value of $\eta$ does not
belong to this set, the procedure is stopped and the value of $\eta$ is estimated to
belong to $A - M$. In the opposite case we continue the inspection, choosing the
set $Q(m)$ determined by the value of $\tau_2$. If the value of $\eta$ does not belong to this
set, the procedure is stopped and the value of $\eta$ is estimated to belong to $A - M$.
In the opposite case we continue the inspection, choosing the set $Q(m)$ determined
by the value of $\tau_3$, and so on. If we exhaust all the sets $Q(m)$ determined
successively by the values of $\tau_1, \tau_2, \ldots, \tau_n$ without reaching the decision that
the value of $\eta$ belongs to $A - M$, we accept the decision that the value of $\eta$
belongs to $M$. We see that the final decision that the value of $\eta$ belongs to
$A - M$ can be reached at every step of the decision procedure. On the other
hand, the opposite decision that the value of $\eta$ belongs to $M$ can be reached
only at the last step.
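
The procedure just described can be sketched in a few lines. Everything concrete here is a hypothetical illustration: the function name `estimate_membership`, the set $A$ of integers below 60, the set $M$ of multiples of 30 and the three sets $Q(m)$ of multiples of 2, 3 and 5. The sketch also assumes that inequality (5) holds, so that the constant decision 0 is not the Bayes solution.

```python
def estimate_membership(value, Q, taus):
    """Sequential inspection of the sets Q(tau_1), ..., Q(tau_n).

    Returns (decision, steps): decision 1 means the value is estimated to
    belong to M, decision 0 that it is estimated to belong to A - M.
    """
    for step, m in enumerate(taus, start=1):
        if value not in Q[m]:
            return 0, step       # stop early: estimated to lie in A - M
    return 1, len(taus)          # all n inspections passed


# Hypothetical example: M consists of the multiples of 30 in A, and each
# Q(m) contains M, so that condition (3) is satisfied.
A = set(range(60))
Q = {
    1: {a for a in A if a % 2 == 0},
    2: {a for a in A if a % 3 == 0},
    3: {a for a in A if a % 5 == 0},
}
M = {a for a in A if a % 30 == 0}
```

With the observed indices `[1, 3, 1, 2]` the element 10 is rejected only at the fourth step, while with `[1, 3, 1]` the element 20 is wrongly accepted, which illustrates that the decision for $M$ is reached only at the last step and may be incorrect for finite $n$.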

Now we shall show that under the two additional conditions

$$(6) \qquad \bigcap_{m=1}^{\infty} Q(m) = M,$$

$$(7) \qquad \mu\Big[\bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \{\omega : \tau_n(\omega) = m\}\Big] = 1,$$

the Bayes solution of the statistical decision problem $(\varphi, \xi)$ becomes
asymptotically independent of the a priori probability $\alpha$.

Roughly speaking, condition (6) together with (3) expresses the natural
requirement that the approximation of $M$ by the successive intersections of the sets
$Q(m)$ can be arbitrarily close, and condition (7) means that the sequence
$\tau_1, \tau_2, \tau_3, \ldots$ exhausts with probability one the whole set of positive integers.

For instance, condition (7) is satisfied whenever the integer valued random
variables $\tau_1, \tau_2, \tau_3, \ldots$ are mutually independent, identically distributed, and
such that

$$\mu\{\omega : \tau_1(\omega) = m\} > 0$$

for $m = 1, 2, 3, \ldots$.

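Clearly, under the last condition,

$$\mu\Big[\bigcap_{n=1}^{k} \{\omega : \tau_n(\omega) \neq m\}\Big] = [\mu\{\omega : \tau_1(\omega) \neq m\}]^k$$

for $k = 1, 2, 3, \ldots$, and

$$\mu\{\omega : \tau_1(\omega) \neq m\} < 1,$$

$m = 1, 2, 3, \ldots$; hence,

$$\mu\Big[\bigcap_{n=1}^{\infty} \{\omega : \tau_n(\omega) \neq m\}\Big] = 0$$

for $m = 1, 2, 3, \ldots$, that is,

$$\mu\Big[\bigcup_{m=1}^{\infty} \bigcap_{n=1}^{\infty} \{\omega : \tau_n(\omega) \neq m\}\Big] = 0$$

or, equivalently, (7) holds.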
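
The product formula used in this argument can be verified by direct enumeration for a small case. The sketch below is only an illustration under stated assumptions: the distribution `p` is hypothetical and, unlike the situation of condition (7), is concentrated on the three values 1, 2, 3 so that all draws can be listed.

```python
from itertools import product
from math import prod


def miss_probability_by_enumeration(p, m, k):
    """Probability that none of k independent draws with distribution p equals m."""
    values = list(p)
    return sum(
        prod(p[v] for v in draw)            # independence: multiply point masses
        for draw in product(values, repeat=k)
        if m not in draw
    )


# Hypothetical distribution of tau_1 (assumed for the illustration).
p = {1: 0.5, 2: 0.3, 3: 0.2}
missed_after_4 = miss_probability_by_enumeration(p, 3, 4)
missed_after_8 = miss_probability_by_enumeration(p, 3, 8)
```

The value `missed_after_4` agrees with $(1 - 0.2)^4$, and the probabilities decrease geometrically in $k$, which is the reason why each fixed $m$ is eventually observed with probability one.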

Our theorem 4 can be completed as

Theorem 5. If $\alpha > 0$ and the conditions (3), (6), (7) are satisfied then there
exists a positive integer $k$ such that $\beta_n$ is the Bayes solution of size $n$ of the
statistical decision problem $(\varphi, \xi)$ whenever $n > k$.

Since by theorem 3 $h_n^-[\xi(\cdot)] \to h^-[\xi(\cdot)]$ with probability one as $n \to \infty$,
by theorem 4 and by the assumption $\alpha > 0$ of theorem 5 it suffices to show
that $h^-(1, 1, 1, \ldots) = 0$, that is, that

$$(8) \qquad \mu\Big(\{\omega : \eta(\omega) \in A - M\} \cap \bigcap_{n=1}^{\infty} \{\omega : \eta(\omega) \in Q[\tau_n(\omega)]\}\Big) = 0.$$

In order to simplify the notation we shall write

$$\bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \{\omega : \tau_n(\omega) = m\} = G.$$

It follows from (6) that

$$G \cap \{\omega : \eta(\omega) \in A - M\} \cap \bigcap_{n=1}^{\infty} \{\omega : \eta(\omega) \in Q[\tau_n(\omega)]\} = \emptyset,$$

hence

$$\mu\Big(G \cap \{\omega : \eta(\omega) \in A - M\} \cap \bigcap_{n=1}^{\infty} \{\omega : \eta(\omega) \in Q[\tau_n(\omega)]\}\Big) = 0,$$

and since by (7) $\mu(G) = 1$, we obtain (8), Q.E.D.

Let us denote by $\beta$ the decision function which associates with every sample
point $x$ from $X$ the decision 1 or 0, according as $x = (1, 1, 1, \ldots)$ or
$x \neq (1, 1, 1, \ldots)$. By theorem 2, $\beta$ is the Bayes solution of the statistical
decision problem $(\varphi, \xi)$ with respect to the whole sigma-algebra $\mathfrak{X}$; hence it is
of infinite size. Since the probability of the random event of incorrect decisions is
equal to zero, the decision function $\beta$, in fact, becomes a proof that the value
of $\eta$ belongs to $M$ or to its complement $A - M$.

Now we shall introduce a function $l$ on $\Omega$ whose values are positive integers
or $\infty$ as follows:

$$\{\omega : l(\omega) = 1\} = \{\omega : \chi_1(\omega) = 0\},$$

$$\{\omega : l(\omega) = n\} = \{\omega : \chi_n(\omega) = 0\} \cap \bigcap_{j=1}^{n-1} \{\omega : \chi_j(\omega) = 1\}$$

for $n = 2, 3, 4, \ldots$ and

$$\{\omega : l(\omega) = \infty\} = \bigcap_{n=1}^{\infty} \{\omega : \chi_n(\omega) = 1\}.$$

We see at once that $l$ is an ordinary random variable, provided that the
definition is modified in such a way that the possibility

$$\mu\{\omega : l(\omega) = \infty\} > 0$$

is not excluded.
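
In computational terms, $l$ is the index of the first coordinate of the observed sequence that equals 0. A minimal sketch follows, with the hypothetical name `length` and with the caveat that a program can only inspect a finite prefix, so that the value `math.inf` here means only that no 0 occurred in the prefix supplied.

```python
import math


def length(chi):
    """Index (starting at 1) of the first 0 among chi_1, chi_2, ...

    Returns math.inf when the given finite prefix contains no 0.
    """
    for n, value in enumerate(chi, start=1):
        if value == 0:
            return n
    return math.inf
```

For example, `length([1, 1, 0, 1])` is 3 and `length([1, 1, 1])` is `math.inf`.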

The random variable $l$ is said to be the length of the decision function $\beta$.

Theorem 6. If conditions (3), (6) and (7) are satisfied then the length of the
decision function is infinite with conditional probability one under the condition
that the value of $\eta$ belongs to $M$ and it is finite with conditional probability one under
the condition that the value of $\eta$ belongs to $A - M$.

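By the definition of $l$,

$$\mu[\{\omega : l(\omega) = \infty\} \mid \{\omega : \eta(\omega) \in M\}] = \frac{1}{\alpha} \, \mu\Big(\{\omega : \eta(\omega) \in M\} \cap \bigcap_{n=1}^{\infty} \{\omega : \eta(\omega) \in Q[\tau_n(\omega)]\}\Big).$$

Using the conditions (3), (6), and (7), we see that (8) holds, hence the first
assertion of theorem 6 is an immediate consequence of (8). Since, in addition,

$$\mu[\{\omega : l(\omega) = \infty\} \mid \{\omega : \eta(\omega) \in A - M\}] = \frac{1}{1 - \alpha} \, \mu\Big(\{\omega : \eta(\omega) \in A - M\} \cap \bigcap_{n=1}^{\infty} \{\omega : \eta(\omega) \in Q[\tau_n(\omega)]\}\Big),$$

the second assertion of theorem 6 follows at once from (8).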

Let us note that under the assumptions of the theorem just proved it is not
true that the conditional moments of $l$ under the condition that the value of $\eta$
belongs to $A - M$ are finite, that is, the analogue of theorem 2 in [8] does not
hold. This disadvantage, however, can be removed by adding further restrictive
conditions.

4.  Semantic  concepts 

The statistical decision problem of section 3 is based on observations on the
sets $Q(1), Q(2), Q(3), \ldots$ which replace the direct observation on $M$. The most
natural way to obtain the sets $Q(1), Q(2), Q(3), \ldots$ is by a reduction
of resolving power in $A$, which can be formally described by an appropriate
application of equivalence relations.

A  binary  relation  R  in  the  set  A  is  said  to  be  an  equivalence  relation  if  it  is 
reflexive,  symmetric  and  transitive.  Every  equivalence  relation  in  A  induces  a 
partition  of  A  into  equivalence  sets  and  vice  versa.  Two  elements  p,  q  of  A 
belong  to  the  same  equivalence  set  if  and  only  if  pRq.  The  equivalence  relation  S 
in  A  is  said  to  be  finer  than  R  and  we  shall  write  S  <  R  if  pSq  implies  pRq  or, 
in  other  words,  if  every  equivalence  set  induced  by  S  is  included  in  at  least  one, 
hence in exactly one, equivalence set induced by R. Clearly, the set of all
equivalence relations in A is partially ordered by < and the identity I is the finest
equivalence relation.

The formal description of reduction of resolving power by equivalence relations
is intuitively justified by the convention that two elements of $A$ which
belong to the same equivalence set cannot be distinguished. Under this convention
it is reasonable to introduce the concept of closure $M^R$ of $M$ induced by
the equivalence relation $R$, requiring that

$$M^R = \bigcup_{q \in M} \{p : pRq\}.$$

Clearly, $M^I = M$, that is, the application of the identity $I$ on $M$ has no effect,
and $M^R \subset M^S$ whenever $R < S$.

Let $R_1, R_2, R_3, \ldots$ be a sequence of equivalence relations in $A$. Putting

$$M^{R_n} = Q(n)$$

for $n = 1, 2, 3, \ldots$, we see that, in fact, the decision problem of section 3 is
based on observations at reduced resolving power. This artificial reduction of
resolving power is justified by the fact that, in general, $M^R$ has a simpler
structure than $M^S$ whenever $S < R$.
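
The closure operation is easy to experiment with on a finite set. In the sketch below the set $A$ of integers below 12, the subset $M = \{0, 6\}$ and the congruences modulo 2, 3 and 6 are hypothetical choices made for illustration; the function `closure` follows the defining union literally.

```python
def closure(M, A, related):
    """Closure M^R: all p in A that are R-related to some q in M."""
    return {p for p in A for q in M if related(p, q)}


# Hypothetical example data.
A = set(range(12))
M = {0, 6}


def identity(p, q):
    return p == q


def mod2(p, q):
    return p % 2 == q % 2


def mod3(p, q):
    return p % 3 == q % 3


def mod6(p, q):
    return p % 6 == q % 6


closure_identity = closure(M, A, identity)
closure_mod6 = closure(M, A, mod6)
closure_mod3 = closure(M, A, mod3)
closure_mod2 = closure(M, A, mod2)
```

Congruence modulo 6 is finer than congruence modulo 3 and modulo 2, and accordingly `closure_mod6` is contained in both `closure_mod3` and `closure_mod2`; the identity leaves $M$ unchanged.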

The application of the elementary facts summarized in section 3 to our main
question of statistical estimation of provability by interpretations in models
requires a number of restrictions which must be imposed on the sets $A$ and $M$
and on the equivalence relations $R_1, R_2, R_3, \ldots$.

First of all we shall suppose that $A$ is a Boolean algebra. As usual, we shall
denote by 0 and 1 the zero and unity of $A$, by $p'$ the complement of the element
$p$ of $A$, by $\wedge$ the operation of forming the greatest lower bound, and by $\vee$ that
of forming the least upper bound in $A$.

In  order  to  eliminate  misunderstandings  we  recall  that  the  subset  M  of  A 
is  said  to  be  a  Boolean  ideal  in  A  if  it  contains  the  greatest  lower  bound  of  any 
two  of  its  elements  as  well  as  the  least  upper  bound  of  any  two  elements  of  A 
at least one of which belongs to M. The algebraic structure just defined is
usually called a dual Boolean ideal. We shall, however, omit the qualifier dual
because no other ideals will occur in this paper.

The relation $R$ defined in $A$ is said to be a Boolean congruence relation if it
is an equivalence relation which, in addition, satisfies the condition

$$(9) \qquad pRq \text{ implies } (p' \vee r) \, R \, (q' \vee r).$$

The  simplest  algebraic  structure,  which  enables  the  treatment  of  propositional 
functions  of  mathematical  logic  and  for  which  the  concept  of  interpretation  is 
natural,  is  that  of  monadic  algebra  introduced  by  P.  R.  Halmos  [2]. 

A monadic algebra is a Boolean algebra $A$ together with an operator $\forall$ which
assigns to every element $p$ of $A$ an element $\forall p$ of $A$ in such a way that

$$\forall 1 = 1,$$

$$\forall p \leq p$$

for every element $p$ of $A$, and

$$\forall(p \vee \forall q) = \forall p \vee \forall q$$

for every $p$ and $q$ in $A$. The operator $\forall$ is said to be the universal quantifier in $A$.
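
A concrete instance of these axioms is the algebra of all 0-1 valued functions on a finite domain, with the quantifier sending a function to the constant function equal to its minimum. The sketch below checks the three axioms exhaustively for a hypothetical domain of three points; the names `ELEMENTS`, `join`, `leq` and `forall` are introduced only for this illustration.

```python
from itertools import product

DOMAIN_SIZE = 3                                  # assumed size of the domain
ELEMENTS = list(product([0, 1], repeat=DOMAIN_SIZE))
ONE = (1,) * DOMAIN_SIZE                         # unity of the Boolean algebra


def join(p, q):
    """Least upper bound, computed pointwise."""
    return tuple(a | b for a, b in zip(p, q))


def leq(p, q):
    """Boolean order, computed pointwise."""
    return all(a <= b for a, b in zip(p, q))


def forall(p):
    """Universal quantifier: constant function equal to the minimum of p."""
    return (min(p),) * DOMAIN_SIZE


axioms_hold = (
    forall(ONE) == ONE
    and all(leq(forall(p), p) for p in ELEMENTS)
    and all(
        forall(join(p, forall(q))) == join(forall(p), forall(q))
        for p in ELEMENTS
        for q in ELEMENTS
    )
)
```

In this model `forall(p)` is the unity exactly when the propositional function `p` is true at every point of the domain.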

A subset $M$ of a monadic algebra $A$ is said to be a monadic ideal in $A$ whenever
$M$ is a Boolean ideal in $A$ and

$$p \in M \text{ implies } \forall p \in M.$$

A monadic ideal $M$ in the monadic algebra $A$ is called maximal if it is proper,
that is, if $M \neq A$, and $M$ is not a proper subset of any other proper ideal in $A$.

The relation $R$ defined in the monadic algebra $A$ is said to be a monadic
congruence relation if it is a Boolean congruence relation and if, in addition,

$$(10) \qquad pRq \text{ implies } \forall p \, R \, \forall q.$$

A monadic congruence relation $R$ in the monadic algebra $A$ is called simple
whenever the monadic residual class algebra $A(R)$ of $A$ modulo $R$ is simple, that
is, when there is no proper monadic ideal in A other than that containing the
sole element 1.

The  relevant  properties  of  closures  of  monadic  ideals  will  be  expressed  by  the 
following  lemma: 

If M is a monadic ideal and R a monadic congruence relation in the monadic
algebra A then the closure $M_R$ of M induced by R is a monadic ideal in A. If, in
addition, R is simple then either $M_R = A$ or $M_R$ is maximal.

We shall first show that $M_R$ is a Boolean ideal in A. Let $p \in M_R$ and $q \in A$.
By the definition of the closure $M_R$ of M, there exists an element $r_p$ of M such
that $r_p \mathrel{R} p$. By (9), $(r_p' \vee 0) \mathrel{R} (p' \vee 0)$, that is, $r_p' \mathrel{R} p'$. Hence
$(r_p \vee q) \mathrel{R} (p \vee q)$. Since M is a Boolean ideal in A, we have $r_p \vee q \in M$,
hence $p \vee q \in M_R$. Now let us suppose, in addition, that $q \in M_R$. Then there
exists an element $r_q$ of M such that $r_q \mathrel{R} q$. By (9) we have

$(r_p' \vee r_q') \mathrel{R} (p' \vee r_q')$,   $(p' \vee r_q') \mathrel{R} (p' \vee q')$,

and hence

$((r_p' \vee r_q')' \vee 0) \mathrel{R} ((p' \vee r_q')' \vee 0)$,

$((p' \vee r_q')' \vee 0) \mathrel{R} ((p' \vee q')' \vee 0)$

or, equivalently,

$(r_p \wedge r_q) \mathrel{R} (p \wedge r_q)$,
$(p \wedge r_q) \mathrel{R} (p \wedge q)$,

and, using the transitivity of R, we obtain

$(r_p \wedge r_q) \mathrel{R} (p \wedge q)$.

Since M is a Boolean ideal in A we have $r_p \wedge r_q \in M$, hence $p \wedge q \in M_R$. We
see that $M_R$ is a Boolean ideal in A. Now it remains to show that $M_R$ is a monadic
ideal in A, that is, that $\forall p \in M_R$ whenever $p \in M_R$. Since $r_p \mathrel{R} p$, it follows from
(10) that $\forall r_p \mathrel{R} \forall p$, hence, using the assumption that M is a monadic ideal in A,
we have $\forall r_p \in M$ and, consequently, $\forall p \in M_R$. This completes the proof of the
first part of our lemma. If R is simple then, by the definition of simplicity, the
class of all congruence sets which have a nonempty intersection with M is equal
either to A(R) or to the monadic ideal {1} in A(R), hence, either $M_R = A$ or
$M_R = \{p : p \mathrel{R} 1\}$, Q.E.D.
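The dichotomy stated in the lemma can be observed on a small example. The sketch below rests on assumptions that are not part of the paper: A is the algebra of all subsets of {0, 1, 2} with the identity operator as quantifier (which satisfies the three defining conditions trivially), M consists of the subsets containing both 0 and 1, and the hypothetical congruence attached to a point x relates two subsets whenever they agree on membership of x. The residual class algebra then has two elements, so each such congruence is simple.

```python
from itertools import combinations

X = (0, 1, 2)  # hypothetical three-point domain
# The algebra A: all subsets of X, with the identity as quantifier.
A = [frozenset(c) for k in range(len(X) + 1) for c in combinations(X, k)]

# A monadic ideal M: all elements containing both 0 and 1.
M = {frozenset({0, 1}), frozenset({0, 1, 2})}


def related(p, q, x):
    """Simple congruence attached to the point x: agreement on membership of x."""
    return (x in p) == (x in q)


def closure(ideal, x):
    """Closure of the ideal: all elements congruent to some element of it."""
    return {p for p in A if any(related(r, p, x) for r in ideal)}


for x in X:
    print(x, len(closure(M, x)))
```

For the points 0 and 1 the closure is the set of all elements containing that point, which is a maximal ideal; for the point 2 the closure is the whole algebra, because M contains one element with 2 and one without.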

A monadic logic is a pair (A, M), where A is a monadic algebra and M is a
monadic ideal in A. The monadic logic (A, M) represents a deductive theory
in A. The elements of A which belong to M are called provable. If R is a simple
congruence relation in A then the closure $M_R$ of M induced by R is said to be an
interpretation of M in the model A(R). If an element p of A belongs to the
interpretation of M we shall say that p is true in that interpretation and
otherwise that it is false.

The monadic logic (A, M) is said to be semantically consistent if there exists
at least one interpretation of M in a model.

Since $M \subset M_R$, we can state that a provable element of A is true in every
interpretation. Whenever the opposite conclusion is possible then the monadic
logic (A, M) is called semantically complete. More precisely, the monadic logic
(A, M) is said to be semantically complete if M is equal to the intersection of
all its interpretations.

For our purposes, however, a restricted version of semantic completeness is
more appropriate. Let $\Omega$ be a class of interpretations of M. The monadic logic
(A, M) is said to be semantically $\Omega$-complete, whenever

$M = \bigcap_{Q \in \Omega} Q$.
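For a finite example the defining equality can be tested directly. The following sketch assumes a hypothetical toy logic that does not appear in the paper: A is the algebra of all subsets of {0, 1, 2} with the identity operator as quantifier, M consists of the subsets containing both 0 and 1, and the interpretation attached to a point x is the maximal ideal of all subsets containing x. The function names are illustrative.

```python
from itertools import combinations

X = (0, 1, 2)  # hypothetical three-point domain
A = [frozenset(c) for k in range(len(X) + 1) for c in combinations(X, k)]

# Provable elements of the toy logic: subsets containing both 0 and 1.
M = {frozenset({0, 1}), frozenset({0, 1, 2})}


def interpretation(x):
    """Maximal ideal of all elements of A containing the point x."""
    return frozenset(p for p in A if x in p)


def is_omega_complete(ideal, omega):
    """True if the ideal equals the intersection of the interpretations in omega."""
    return frozenset(ideal) == frozenset.intersection(*omega)


print(is_omega_complete(M, [interpretation(0), interpretation(1)]))
print(is_omega_complete(M, [interpretation(0)]))
```

With both interpretations the intersection is exactly M; with the interpretation at 0 alone the intersection also contains the nonprovable elements {0} and {0, 2}, so the restricted class is too small for completeness.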

In order to eliminate degenerate cases it is natural to assume that the monadic
logic (A, M) is semantically consistent and semantically $\Omega$-complete. Clearly,
the assumption of semantic consistency can be replaced by $M \ne A$ and, by our
lemma, there is no restriction of generality if we assume that the interpretations
from $\Omega$ are maximal monadic ideals.

The estimation of provability or nonprovability of elements of a monadic
logic is based upon the inspection of their truth or falsehood in interpretations in
models. Since to each interpretation Q from $\Omega$ there corresponds a simple
monadic congruence relation $R_Q$ such that $Q = M_{R_Q}$, the idea of artificial
reduction of resolving power by simple monadic congruence relations is justified by
the fact that, by the lemma, the induced closures are maximal monadic ideals
which evidently have an extremely simple algebraic structure.

The application of the results established in section 3 to the question of
statistical estimation of provability in monadic logic requires a further
restriction, namely, that $\Omega$ is denumerable. In this case we can write

$\Omega = \{Q(1), Q(2), Q(3), \dots\}$.

The random variable $\eta$ chooses an element of the monadic algebra A, the
provability or nonprovability of which is to be estimated on the basis of
interpretations of M chosen from $\Omega$ by the random variables $\tau_1, \tau_2, \dots, \tau_n$.

One may intuitively expect that the following decision procedure is the most
favorable one. At the first step we choose the interpretation Q(m) determined by
the value of $\tau_1$. If the value of $\eta$ is false in this interpretation, the procedure is
stopped and the value of $\eta$ is estimated to be nonprovable. In the opposite case
we continue the inspection, choosing the interpretation Q(m) determined by the
value of $\tau_2$. If the value of $\eta$ is false in this interpretation, the procedure is stopped
and the value of $\eta$ is estimated to be nonprovable. In the opposite case we
continue the inspection, choosing the interpretation Q(m) determined by $\tau_3$, and so
on. Exhausting all the interpretations Q(m) determined successively by the
values of $\tau_1, \tau_2, \dots, \tau_n$ without reaching the decision that the value of $\eta$ is
nonprovable, we accept the decision that the value of $\eta$ is provable.
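The procedure just described can be sketched in a few lines. The sketch assumes a hypothetical toy model that is not taken from the paper: elements are subsets of {0, 1, 2}, the interpretation Q(m) consists of all subsets containing the point `POINT[m]`, and the observed values of the random variables are passed in as a plain list. The names `POINT`, `is_true` and `estimate` are illustrative.

```python
# Hypothetical toy model: interpretation Q(m) is the set of all subsets
# of {0, 1, 2} that contain the point POINT[m].
POINT = {1: 0, 2: 1}


def is_true(element, m):
    """Truth of an element in the interpretation Q(m) of the toy model."""
    return POINT[m] in element


def estimate(element, taus):
    """Inspect Q(tau_1), ..., Q(tau_n) in order.

    Stops with 'nonprovable' at the first interpretation in which the
    element is false; otherwise returns 'provable' after all n steps.
    The second component is the number of interpretations inspected.
    """
    for step, m in enumerate(taus, start=1):
        if not is_true(element, m):
            return "nonprovable", step
    return "provable", len(taus)


print(estimate(frozenset({0, 2}), [1, 2, 1]))  # ('nonprovable', 2)
print(estimate(frozenset({0, 1}), [1, 2, 1]))  # ('provable', 3)
print(estimate(frozenset({0, 2}), [1, 1, 1]))  # ('provable', 3)
```

The last call shows the only kind of error the nondegenerate procedure can make: an element that is false in Q(2) is estimated to be provable because the sample happened to contain Q(1) only.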

In fact, the decision procedure just described minimizes the probability of
making an incorrect decision only if the a priori probability $\alpha$ that $\eta$ chooses a
provable element of A is sufficiently large. Whenever $\alpha$ is small then the
degenerate decision procedure which always estimates the value of $\eta$ to be
nonprovable is better. The exact discrimination between these two decision
procedures is contained in theorem 4.

If the a priori probability $\alpha$ is positive, if the values of the random variables
$\tau_1, \tau_2, \tau_3, \dots$ exhaust with probability one the whole set of positive integers, and
if the monadic logic (A, M) is $\Omega$-complete then, by theorem 5, for a sufficiently
large number of interpretations to be inspected, the nondegenerate estimation
procedure is the most favorable one in the sense that the probability of making
an incorrect estimate becomes a minimum. Let us note that the condition of
semantic consistency is in this case always fulfilled automatically whenever
$\alpha > 0$ and (A, M) is semantically $\Omega$-complete.

In the language of monadic logic the decision function p of infinite size
occurring in theorem 6 is said to be the heuristic reasoning about the element of A
chosen by $\eta$ and the random variable l is called the length of the heuristic
reasoning p.

The content of theorem 6 can be expressed as follows: If $\alpha > 0$, if the values
of the random variables $\tau_1, \tau_2, \tau_3, \dots$ exhaust with probability one the whole set
of positive integers and if the monadic logic (A, M) is semantically $\Omega$-complete,
then the length of the heuristic reasoning about the value of $\eta$ is infinite with
conditional probability one under the condition that a provable element of A
has been chosen by $\eta$ and it is finite with conditional probability one under the
condition that the element of A chosen by $\eta$ was nonprovable.
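The two cases can be illustrated by a small simulation. This is only a sketch under assumptions that the paper does not make: the indices are drawn independently and uniformly from a finite range, the infinite inspection is truncated at a hypothetical bound `max_steps`, and truth in the interpretation with index m is supplied as an arbitrary predicate.

```python
import random


def reasoning_length(is_true_in, draw_tau, max_steps):
    """Number of inspected interpretations up to the first refutation.

    is_true_in(m) tells whether the element is true in interpretation m,
    draw_tau() returns the next randomly chosen index. Returns None if
    no refutation occurs within max_steps, which stands in for an
    infinite length in this truncated simulation.
    """
    for step in range(1, max_steps + 1):
        if not is_true_in(draw_tau()):
            return step
    return None


rng = random.Random(0)  # fixed seed, chosen arbitrarily


def draw():
    """Assumed chance mechanism: uniform choice among five interpretations."""
    return rng.randint(1, 5)


# A provable element is true in every interpretation: never refuted.
print(reasoning_length(lambda m: True, draw, 1000))

# A nonprovable element is false in at least one interpretation (here m = 3).
print(reasoning_length(lambda m: m != 3, draw, 1000))
```

The truncation is the essential limitation of any finite experiment of this kind: a result of `None` does not distinguish a provable element from a nonprovable one that has not yet been refuted.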

Clearly, only the last assertion is practically effective because only
nonprovability can be discovered after a finite number of steps. On the other hand, this
pessimistic opinion concerning heuristic reasoning is weakened by the fact that
if provability is estimated then this result is asymptotically good.